Central North Sea
BEAVER: An Efficient Deterministic LLM Verifier
Suresh, Tarun, Wadhwa, Nalin, Banerjee, Debangshu, Singh, Gagandeep
As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates give some intuition about model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification, and secure code generation tasks across multiple state-of-the-art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high-risk instances than baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
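The idea of bounding constraint satisfaction by exploring a frontier of prefixes can be illustrated with a minimal sketch. It is not the BEAVER implementation: the best-first expansion order, the function `bound_satisfaction`, the `EOS` token, and the toy model are all assumptions made for illustration. The lower bound is the probability mass of completed sequences that satisfy the constraint; the upper bound adds the mass of unexpanded prefixes that have not yet violated it. Because the constraint is prefix-closed, a violating prefix can be discarded together with all of its extensions.

```python
import heapq
from typing import Callable, Dict, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence token
Prefix = Tuple[str, ...]


def bound_satisfaction(
    next_token_probs: Callable[[Prefix], Dict[str, float]],
    satisfies: Callable[[Prefix], bool],
    max_expansions: int,
) -> Tuple[float, float]:
    """Return (lower, upper) bounds on P(completed output satisfies the constraint)."""
    if not satisfies(()):
        return 0.0, 0.0  # prefix-closed: every extension violates as well
    lower = 0.0          # mass of completed, satisfying sequences
    frontier_mass = 1.0  # mass of unexpanded prefixes that still satisfy
    frontier = [(-1.0, ())]  # max-heap on probability via negated mass
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_mass, prefix = heapq.heappop(frontier)
        mass = -neg_mass
        frontier_mass -= mass
        for token, prob in next_token_probs(prefix).items():
            child = prefix + (token,)
            if not satisfies(child):
                continue  # violating prefix: drop it and all of its extensions
            if token == EOS:
                lower += mass * prob
            else:
                heapq.heappush(frontier, (-(mass * prob), child))
                frontier_mass += mass * prob
    upper = min(1.0, lower + max(0.0, frontier_mass))
    return lower, upper


def toy_model(prefix: Prefix) -> Dict[str, float]:
    # Hypothetical context-independent next-token distribution.
    return {"a": 0.5, "bad": 0.2, EOS: 0.3}


def no_bad_token(prefix: Prefix) -> bool:
    # Prefix-closed constraint: once "bad" appears, no extension can satisfy it.
    return "bad" not in prefix


lower, upper = bound_satisfaction(toy_model, no_bad_token, max_expansions=20)
print(lower, upper)
```

For this toy model the exact satisfaction probability is 0.3 / (1 - 0.5) = 0.6, and the interval returned by the sketch brackets that value and narrows with every expansion.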
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Automating the Refinement of Reinforcement Learning Specifications
Ambadkar, Tanmay, Žikelić, Đorđe, Verma, Abhinav
Logical specifications have been shown to help reinforcement learning algorithms achieve complex tasks. However, when a task is under-specified, agents might fail to learn useful policies. In this work, we explore the possibility of improving coarse-grained logical specifications via an exploration-guided strategy. We propose AutoSpec, a framework that searches for a logical specification refinement whose satisfaction implies satisfaction of the original specification, but which provides additional guidance, thereby making it easier for reinforcement learning algorithms to learn useful policies. AutoSpec is applicable to reinforcement learning tasks specified via the SpectRL specification logic. We exploit the compositional nature of specifications written in SpectRL, and design four refinement procedures that modify the abstract graph of the specification by either refining its existing edge specifications or by introducing new edge specifications. We prove that all four procedures maintain specification soundness, i.e., any trajectory satisfying the refined specification also satisfies the original. We then show how AutoSpec can be integrated with existing reinforcement learning algorithms for learning policies from logical specifications. Our experiments demonstrate that using the refined logical specifications produced by AutoSpec yields promising improvements in the complexity of control tasks that can be solved.
- North America > United States > Pennsylvania (0.04)
- Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)
- Europe > United Kingdom > North Sea > Central North Sea (0.04)
Space Explanations of Neural Network Classification
Labbaf, Faezeh, Kolárik, Tomáš, Blicha, Martin, Fedyukovich, Grigory, Wand, Michael, Sharygina, Natasha
Explainability of decision-making AI systems (XAI), and specifically neural networks (NNs), is a key requirement for deploying AI in sensitive areas [18]. A recent trend in explaining NNs is based on formal methods and logic, providing explanations for the decisions of machine learning systems [24, 31, 32, 41, 42, 44] accompanied by provable guarantees regarding their correctness. Yet, rigorous exploration of the continuous feature space requires estimating decision boundaries with complex shapes. This, however, remains a challenge because existing explanations [24, 31, 32, 41, 42, 44] constrain only individual features and hence fail to capture the relationships among features that are essential to understanding the reasons behind the multi-parametrized classification process. We address the need to provide interpretations of NN systems that are as meaningful as possible using a novel concept of Space Explanations, delivered by a flexible symbolic reasoning framework with Craig interpolation [12] at the heart of the machinery.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Research Report > Promising Solution (0.48)
- Instructional Material > Course Syllabus & Notes (0.32)
Extracting Robust Register Automata from Neural Networks over Data Sequences
Hong, Chih-Duo, Jiang, Hongjian, Lin, Anthony W., Markgraf, Oliver, Parsert, Julian, Tan, Tony
Automata extraction is a method for synthesising interpretable surrogates for black-box neural models that can be analysed symbolically. Existing techniques assume a finite input alphabet, and thus are not directly applicable to data sequences drawn from continuous domains. We address this challenge with deterministic register automata (DRAs), which extend finite automata with registers that store and compare numeric values. Our main contribution is a framework for robust DRA extraction from black-box models: we develop a polynomial-time robustness checker for DRAs with a fixed number of registers, and combine it with passive and active automata learning algorithms. This combination yields surrogate DRAs with statistical robustness and equivalence guarantees. As a key application, we use the extracted automata to assess the robustness of neural networks: for a given sequence and distance metric, the DRA either certifies local robustness or produces a concrete counterexample. Experiments on recurrent neural networks and transformer architectures show that our framework reliably learns accurate automata and enables principled robustness evaluation. Overall, our results demonstrate that robust DRA extraction effectively bridges neural network interpretability and formal reasoning without requiring white-box access to the underlying network.
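To make the notion of a register automaton concrete, the following toy sketch shows a deterministic automaton with a single register that stores one numeric value and compares later inputs against it. It is only an illustration of the data structure, not the extraction framework or robustness checker described above; the class names, the single-register restriction, and the example language are assumptions. Determinism is assumed to come from mutually exclusive guards, and the first matching transition is taken.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence

# A guard compares the current input value with the register content.
Guard = Callable[[float, Optional[float]], bool]


@dataclass(frozen=True)
class Transition:
    guard: Guard
    store: bool   # overwrite the register with the current value?
    target: str


@dataclass
class OneRegisterDRA:
    initial: str
    accepting: FrozenSet[str]
    transitions: Dict[str, List[Transition]]

    def accepts(self, word: Sequence[float]) -> bool:
        state: str = self.initial
        register: Optional[float] = None
        for value in word:
            for t in self.transitions.get(state, []):
                if t.guard(value, register):
                    if t.store:
                        register = value
                    state = t.target
                    break
            else:
                return False  # no enabled transition: reject
        return state in self.accepting


# Hypothetical example language: non-empty sequences in which no value
# drops below the first value of the sequence.
never_below_first = OneRegisterDRA(
    initial="start",
    accepting=frozenset({"tracking"}),
    transitions={
        "start": [Transition(lambda v, r: True, True, "tracking")],
        "tracking": [
            Transition(lambda v, r: v >= r, False, "tracking"),
            Transition(lambda v, r: v < r, False, "rejected"),
        ],
        "rejected": [Transition(lambda v, r: True, False, "rejected")],
    },
)

print(never_below_first.accepts([1.0, 2.0, 1.5]))
print(never_below_first.accepts([1.0, 0.5, 2.0]))
```

A finite automaton over a finite alphabet could not express this language for arbitrary real-valued inputs, which is the gap that registers fill.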
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Michigan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > North Sea > Central North Sea (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Education (0.46)
- Leisure & Entertainment > Sports (0.46)
Building Trustworthy AI by Addressing its 16+2 Desiderata with Goal-Directed Commonsense Reasoning
Tudor, Alexis R., Zeng, Yankai, Wang, Huaduo, Arias, Joaquin, Gupta, Gopal
Current advances in AI and its applicability have highlighted the need to ensure its trustworthiness for legal, ethical, and even commercial reasons. Sub-symbolic machine learning algorithms, such as LLMs, simulate reasoning but hallucinate, and their decisions cannot be explained or audited (both crucial aspects of trustworthiness). On the other hand, rule-based reasoners, such as Cyc, are able to provide the chain of reasoning steps but are complex and use a large number of reasoners. We propose a middle ground using s(CASP), a goal-directed constraint-based answer set programming reasoner that employs a small number of mechanisms to emulate reliable and explainable human-style commonsense reasoning. In this paper, we explain how s(CASP) supports the 16 desiderata for trustworthy AI introduced by Doug Lenat and Gary Marcus (2023), and two additional ones: inconsistency detection and the assumption of alternative worlds. To illustrate the feasibility and synergies of s(CASP), we present a range of diverse applications, including a conversational chatbot and a virtually embodied reasoner.
- Europe > Sweden (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Europe > United Kingdom > North Sea > Central North Sea (0.04)
- Law (1.00)
- Health & Medicine (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Follow the STARs: Dynamic $ω$-Regular Shielding of Learned Policies
Anand, Ashwani, Nayak, Satya Prakash, Raha, Ritam, Schmuck, Anne-Kathrin
This paper presents a novel dynamic post-shielding framework that enforces the full class of $ω$-regular correctness properties over pre-computed probabilistic policies. This constitutes a paradigm shift from the predominant setting of safety-shielding -- i.e., ensuring that nothing bad ever happens -- to a shielding process that additionally enforces liveness -- i.e., ensures that something good eventually happens. At its core, our method uses Strategy-Template-based Adaptive Runtime Shields (STARs), which leverage permissive strategy templates to enable post-shielding with minimal interference. As their main feature, STARs introduce a mechanism to dynamically control interference, allowing a tunable enforcement parameter to balance formal obligations and task-specific behavior at runtime. This makes it possible to trigger more aggressive enforcement when needed while permitting optimized policy choices otherwise. In addition, STARs support runtime adaptation to changing specifications or actuator failures, making them especially suited for cyber-physical applications. We evaluate STARs on a mobile robot benchmark to demonstrate their controllable interference when enforcing (incrementally updated) $ω$-regular correctness properties over learned probabilistic policies.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- North America > United States > New York > Richmond County > New York City (0.04)
Large Language Models Imitate Logical Reasoning, but at what Cost?
McGinness, Lachlan, Baumgartner, Peter
We present a longitudinal study that evaluates the reasoning capability of frontier Large Language Models over an eighteen-month period. We measured the accuracy of three leading models from December 2023, September 2024 and June 2025 on true-or-false questions from the PrOntoQA dataset, as well as their faithfulness to reasoning strategies provided through in-context learning. The improvement in performance from 2023 to 2024 can be attributed to hidden Chain of Thought prompting. The introduction of thinking models allowed for significant improvement in model performance between 2024 and 2025. We then present a neuro-symbolic architecture which uses LLMs of fewer than 15 billion parameters to translate the problems into a standardised form. We then parse the standardised forms of the problems into a program to be solved by Z3, an SMT solver, to determine the satisfiability of the query. We report the number of prompt and completion tokens as well as the computational cost in FLOPs for open-source models. The neuro-symbolic approach significantly reduces the computational cost while maintaining near-perfect performance. The common approximation that the number of inference FLOPs is double the product of the active parameters and total tokens was accurate to within 10% for all experiments.
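The FLOPs approximation mentioned at the end of the abstract, double the product of active parameters and total tokens, can be written out as a short sketch. The function names and the numbers below are illustrative assumptions, not values from the study, and the relative error is assumed here to be taken with respect to the measured cost.

```python
def approx_inference_flops(active_params: int, total_tokens: int) -> int:
    """Common rule of thumb: inference FLOPs ~ 2 * active parameters * total tokens."""
    return 2 * active_params * total_tokens


def relative_error(measured_flops: float, estimated_flops: float) -> float:
    """Relative deviation of the estimate from a measured value (assumed convention)."""
    return abs(measured_flops - estimated_flops) / measured_flops


# Illustrative numbers only: a 14-billion-parameter model processing
# 2,000 prompt and completion tokens in total.
estimate = approx_inference_flops(active_params=14_000_000_000, total_tokens=2_000)
print(estimate)
```

A measured cost can then be compared against the estimate with `relative_error`, with a result below 0.10 corresponding to the 10% tolerance reported above.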
- North America > United States (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > North Sea > Central North Sea (0.04)
Reinforcement Learning for Robust Ageing-Aware Control of Li-ion Battery Systems with Data-Driven Formal Verification
Coppola, Rudi, Touloujian, Hovsep, Ombrini, Pierfrancesco, Mazo, Manuel Jr
Rechargeable lithium-ion (Li-ion) batteries are a ubiquitous element of modern technology. In recent decades, the production and design of such batteries and their associated embedded charging and safety protocols, known as Battery Management Systems (BMS), have taken centre stage. A fundamental challenge to be addressed is the trade-off between the speed of charging and the ageing behavior, which results in a loss of capacity in the battery cell. We rely on a high-fidelity physics-based battery model and propose an approach to data-driven charging and safety protocol design. Following a Counterexample-Guided Inductive Synthesis scheme, we combine Reinforcement Learning (RL) with recent developments in data-driven formal methods to obtain a hybrid control strategy: RL is used to synthesise the individual controllers, and a data-driven abstraction guides their partitioning into a switched structure, depending on the initial output measurements of the battery. The resulting discrete selection among RL-based controllers, coupled with the continuous battery dynamics, realises a hybrid system. When a design meets the desired criteria, the abstraction provides probabilistic guarantees on the closed-loop performance of the cell.
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Europe > United Kingdom > North Sea > Central North Sea (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Energy > Energy Storage (1.00)
- Electrical Industrial Apparatus (1.00)